SummEval: Re-evaluating Summarization Evaluation
Authors
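Abstract
The scarcity of comprehensive up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations; 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics; 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format; 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics; and 5) we assemble and share the largest and most diverse, in terms of model types, collection of human judgments of model-generated summaries on the CNN/Daily Mail dataset annotated by both expert judges and crowd-source workers. We hope that this work will help promote a more complete evaluation protocol for text summarization as well as advance research in developing evaluation metrics that better correlate with human judgments.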
Similar resources
Re-evaluating Automatic Summarization with BLEU and 192 Shades of ROUGE
We provide an analysis of current evaluation methodologies applied to summarization metrics and identify the following areas of concern: (1) movement away from evaluation by correlation with human assessment; (2) omission of important components of human assessment from evaluations, in addition to large numbers of metric variants; (3) absence of methods of significance testing improvements over...
Men in nursing: re-evaluating masculinities, re-evaluating gender.
This paper critically interrogates and re-evaluates the notion that it is somehow difficult being a man in nursing and suggests some ways forward which will allow us to gain a more politically astute purchase on gender, nursing and the socio-political context in which the profession operates. Men appear to be well served by a career in nursing. Despite their lesser numbers they are likely to ea...
Re-Evaluating the Evaluation — Does Performance Still Matter?
Is your contribution better than the alternatives? This simple-yet-daunting question has killed more research proposals than The Plague. The answer, in the systems community, has historically involved a direct comparison of raw performance. Faster response time, higher throughput, and lower CPI have been good heuristics to arrive at the question’s hard-to-quantify intent: enhancing end-user expe...
Conclusion: Re-evaluating Eclecticism
When looking at the original data analysis, we see a variety of approaches used to examine the discourse data of focus. The analysis is rich and includes a wide array of features. Conversely, the three single perspective analyses conducted for this Forum each drew upon different linguistic details to support their conclusions with different insights. It is important to consider how different ap...
Re-evaluating revascularization
Blood flow in the coronary circulation is normally well regulated and able to meet even strenuous demand. The presence of an epicardial stenosis poses multiple challenges for safeguarding coronary blood flow and myocardial perfusion, which is further exacerbated by the concurrent presence of microvascular disease. The ability to adjust flow adequately during physiological stress is compromised ...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2021
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00373